Improving the effectiveness of software prefetching with adaptive executions
نویسندگان
چکیده
The effectiveness of software prefetching for tolerating latency depends mainly on the ability of programmers and/or compilers to: 1) predict in advance the magnitude of the run-time remote memory latency, and 2) insert prefetches at a distance that minimizes stall time without causing cache pollution. Scalable heterogeneous multiprocessors, such as network of computers (NOWs), present special challenges to static software prefetching because on these systems the network topology and node configuration are not completely determined at compile time. Furthermore, dynamic software prefetching cannot do much better because individual nodes on heterogeneous large NOWs would tend to experience different remote memory delays over time. A fixed prefetch distance, even when computed at run-time, cannot perform well for the whole duration of a software pipeline. Here we present an adaptive scheme for software prefetching that makes it possible for nodes to dynamically change, not only the amount of prefetching, but the prefetch distance as well. Doing this makes it possible to tailor the execution of software pipeline to the previaling conditions affecting each node. We show how simple performance data collected by hardware monitors can allow programs to observe, evaluate and change their prefetching policies. Our results show that on the benchmarks we simulated adaptive prefetching was capable of improving performance over static and dynamic prefetching by 10% to 60%. More important, future increases in the heterogeneity and size of NOWs will increase the advantages of adaptive prefetching over static and dynamic schemes.
منابع مشابه
Implicit Acceleration of Critical Sections via Unsuccessful Speculation
The speculative execution of critical sections, whether done using HTM via the transactional lock elision pattern or using a software solution such as STM or a sequence lock, has the potential to improve software performance with minimal programmer effort. The technique improves performance by allowing critical sections to proceed in parallel as long as they do not conflict at run time. In this...
متن کاملUnobtrusive Reactive Prefetching: A Multicore Approach for Exploiting Hot Streams in Cache Misses
Processor performance continues to outpace memory performance by a large margin. One approach for mitigating this gap is to employ software-based speculative prefetching. Software dynamic prefetchers are able to identify patterns more complex than those of hardware prefetchers while retaining the ability to respond to a programs dynamic behavior; however modern techniques incur prohibitively hi...
متن کاملHardware and software cache prefetching techniques for MPEG benchmarks
With the popularity of multimedia acceleration instructions such as MMX, MPEG decompression is increasingly executed on general purpose processors instead of dedicated MPEG hardware. The gap between processor speed and memory access means that a significant amount of time is spent in the memory system. As processors get faster—both in terms of higher clock speeds and increased instruction level...
متن کاملOptimizing Performance in Highly Utilized Multicores with Intelligent Prefetching
Khan, M. 2016. Optimizing Performance in Highly Utilized Multicores with Intelligent Prefetching. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1335. 54 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9450-6. Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefetching, to increase pe...
متن کاملThe Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems
Software prefetching and locality optimizations are techniques for overcoming the speed gap between processor and memory. In this paper, we provide a comprehensive summary of current software prefetching and locality optimization techniques, and evaluate the impact of memory trends on the effectiveness of these techniques for three types of applications: regular scientific codes, irregular scie...
متن کامل